Article 3221

Title of the article

EMD-based method to improve the efficiency of speech/pause segmentation 

Authors

Alan K. Alimuradov, Candidate of engineering sciences, associate professor of the sub-department of radio engineering and radioelectronic systems, director of the student research and production business incubator, Penza State University (40 Krasnaya street, Penza, Russia), E-mail: alansapfir@yandex.ru
Aleksandr Yu. Tychkov, Doctor of engineering sciences, head of the sub-department of radio engineering and radioelectronic systems, deputy director of the Research Institute for Basic and Applied Studies, Penza State University (40 Krasnaya street, Penza, Russia), E-mail: tychkov-a@mail.ru
Petr P. Churakov, Doctor of engineering sciences, professor, professor of the sub-department of information and measuring technology and metrology, Penza State University (40 Krasnaya street, Penza, Russia), E-mail: churakov-pp@mail.ru
Aleksey V. Ageykin, Assistant of the sub-department of microbiology, epidemiology and infectious diseases, Medical Institute, Penza State University (40 Krasnaya street, Penza, Russia), E-mail: ageykinav@yandex.ru
Andrey V. Kuz'min, Doctor of engineering sciences, associate professor, professor of the sub-department of information and computing systems, Penza State University (40 Krasnaya street, Penza, Russia), E-mail: a.v.kuzmin@pnzgu.ru
Maksim A. Mitrokhin, Doctor of engineering sciences, associate professor, head of the sub-department of computing technology, Penza State University (40 Krasnaya street, Penza, Russia), E-mail: mmax83@mail.ru
Igor' A. Chernov, Student, Penza State University (40 Krasnaya street, Penza, Russia), E-mail: igorchernov999@mail.ru 

Index UDK

004.934 

DOI

10.21685/2072-3059-2021-2-3 

Abstract

Background. Speech/pause segmentation, that is, the accurate detection of the start and end boundaries of voiced speech, unvoiced speech, and pauses, is one of the most important tasks in speech applications. It is especially important when analyzing the distribution speed, acceleration, and entropy of voiced and unvoiced speech sections and pauses, and when analyzing the average duration of pauses. The aim of the work is to improve the efficiency of speech/pause segmentation using a method based on empirical mode decomposition. Materials and methods. The work uses a technique for the adaptive decomposition of non-stationary signals, namely the improved complete ensemble empirical mode decomposition with adaptive noise. The method was implemented in the MATLAB (MathWorks) mathematical modeling environment. Results. A decomposition-based method has been developed for the preprocessing stage: from the original speech signals it forms a set of new signals for analysis that contain the most reliable information about the start and end boundaries of voiced speech, unvoiced speech, and pauses. The influence of the decomposition method and of the duration of the analyzed signal fragments on the efficiency of speech/pause segmentation was assessed. Segmentation was performed with methods based on the analysis of zero-crossing rate, short-term energy, and one-dimensional Mahalanobis distance. Conclusions. The results show that the proposed method increases the efficiency of segmentation of voiced and unvoiced speech sections by 13.96% for the method based on zero-crossing rate, by 8.24% for the method based on short-term energy, by 5.72% for the method based on the combined analysis of zero-crossing rate and short-term energy, and by 17.85% for the method based on one-dimensional Mahalanobis distance.
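
The three frame-level measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' MATLAB implementation and it omits the decomposition stage entirely. The function names, the frame length and hop, the threshold of 3, and the choice to apply the one-dimensional Mahalanobis distance to frame energies, with the first few frames assumed to contain only background noise, are all assumptions made for illustration.

```python
import math


def frames(signal, frame_len, hop):
    """Split a sequence of samples into overlapping frames."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def short_term_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def mahalanobis_1d(x, mean, std):
    """One-dimensional Mahalanobis distance |x - mean| / std."""
    return abs(x - mean) / std


def label_frames(signal, frame_len=160, hop=80, noise_frames=5, threshold=3.0):
    """Label each frame 'speech' or 'pause' from its short-term energy.

    Assumption (illustrative only): the first `noise_frames` frames hold
    background noise, and a frame whose energy lies more than `threshold`
    standard deviations from the noise mean is treated as speech.
    """
    energies = [short_term_energy(f) for f in frames(signal, frame_len, hop)]
    ref = energies[:noise_frames]
    mean = sum(ref) / len(ref)
    var = sum((e - mean) ** 2 for e in ref) / len(ref)
    std = math.sqrt(var) or 1e-12  # guard against a perfectly constant reference
    return ['speech' if mahalanobis_1d(e, mean, std) > threshold else 'pause'
            for e in energies]
```

For a signal that starts with low-level noise, `label_frames(signal)` returns one label per frame. A zero-crossing-rate criterion could be combined with the energy test in the same loop, which corresponds to the combined method mentioned in the abstract.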

Key words

speech signal processing, speech segmentation, voiced and unvoiced speech, empirical mode decomposition 


 

Created: 20.09.2021 11:47
Updated: 20.09.2021 12:05